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Abstract — The integration of various types of genomic data into predictive models of biological 
networks is one of the main challenges currently faced by computational biology. Constraint-based 
models in particular play a key role in the attempt to obtain a quantitative understanding of 
cellular metabolism at genome scale. In essence, their goal is to frame the metabolic capabilities 
of an organism based on minimal assumptions that describe the steady states of the underlying 
reaction network via suitable stoichiometric constraints, specifically mass balance and energy 
balance (i.e. thermodynamic feasibility). The implementation of these requirements to generate 
viable configurations of reaction fluxes and/or to test given flux profiles for thermodynamic 
feasibility can however prove to be computationally intensive. We propose here a fast and scalable 
stoichiometry-based method to explore the Gibbs energy landscape of a biochemical network at 
steady state. The method is applied to the problem of reconstructing the Gibbs energy landscape 
underlying metabolic activity in the human red blood cell, and to that of identifying and removing 
thermodynamically infeasible reaction cycles in the Escherichia coli metabolic network (iAF1260). 
In the former case, we produce consistent predictions for chemical potentials (or log-concentrations) 
of intracellular metabolites; in the latter, we identify a restricted set of loops (23 in total) in the 
periplasmic and cytoplasmic core as the origin of thermodynamic infeasibility in a large sample 
(10 6 ) of flux configurations generated randomly and compatibly with the prior information available 
on reaction reversibility. 

Author Summary — The operation of biological systems is constrained under all circum- 
stances by the laws of physics. Thermodynamics, in particular, dictates preferential directions in 
which biochemical reactions should flow at stationarity. When applied to cellular reaction systems 
(like metabolic networks), it favors the emergence of some (thermodynamically feasible) ways to 
organize the flow of matter while prohibiting others. The development of detailed predictive models 
for the biochemical activity of a cell relies on the possibility to integrate the laws of thermodynamics 
in genome-scale reconstructions of cellular metabolic networks. In this work we have devised an 
efficient relaxation algorithm to implement thermodynamic constraints in genome-scale models. 
Besides allowing to check for thermodynamic feasibility of reaction flow configurations, it is also 
capable of providing information on other relevant physico-chemical quantities. We have applied it 
to two cellular metabolic networks of different complexity, namely that of human red blood cells 
and that of the bacterium Escherichia coli. In the former case, we have obtained predictions for 
the intracellular chemical state (in terms of metabolite concentrations and reaction free energies) 
consistent with empirical knowledge; in the latter, we have effectively corrected thermodynamically 
infeasible flux configurations. 



INTRODUCTION 



Constraint-based models of cellular metabolism are im- 
portant tools to analyze and predict the chemical activ- 
ity and response to perturbations of cells without relying 
on kinetic details that are often unavailable. In such 
frameworks, the metabolic capabilities of a cell are in- 
ferred from the overall configuration space compatible 
with minimal physico-chemical constraints describing the 
non-equilibrium steady state of the underlying reaction 
network. First, feasible reaction flux vectors need to sat- 
isfy mass-balance conditions. Then, according to the sec- 
ond law of thermodynamics, in an open chemical network 
at steady state and constant temperature and pressure 
the direction of each reaction should ensure a decrease in 
Gibbs energy. Thermodynamic consistency of flux config- 
urations satisfying mass-balance alone is in general not 



guaranteed due to the presence of infeasible cycles [T|- 
[3], even if reaction reversibility is pre-assigned based on 
careful estimations of chemical potentials in physiologic 
conditions [4] (a procedure that was recently extended 
to genome-scale |6]). Besides the flux organization, 
several other aspects involved in the analysis of genome- 
scale metabolic networks hinge directly on the explicit 
inclusion of thermodynamic constraints into the models, 
like the estimation of metabolite concentrations or the 
identification of reactions subject to regulation [7]. 

Much work has been concerned with implementing 
thermodynamic constraints in genome-scale models of 
metabolism. The removal of thermodynamic inconsisten- 
cies was proved to be useful in estimating concentrations 
and affinities besides fluxes in Flux-Balance-Analysis 
[H |9] , whose goal is to identify mass-balanced flux config- 
urations maximizing a pre-determined physiological ob- 
jective function [101 HI]- This has been achieved for 
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instance by building mixed integer-linear or non-linear 
optimization problems that minimize the Euclidean dis- 
tance of concentration levels from experimentally known 
values [T2] , or ensure the absence of cycles in the result- 
ing flux pattern [13-15] . On the other hand, informa- 
tion on feasible Gibbs energy ranges can be retrieved by 
exploiting the patterns of reaction interconnections en- 
coded in the stoichiometry to narrow the experimental 
bounds [7 . These procedures may however require re- 
liable prior thermodynamic information on metabolites 
and/or reactions, a type of knowledge that is often un- 
available. An important lesson of this kind of approaches 
is that the key input for the thermodynamic profiling of a 
reaction network is often provided by the stoichiometric 
matrix [16| . 

The scalability of algorithms to solve mixed integer- 
linear (or non-linear) programming problems may be- 
come an issue when the underlying network size is large 
or when one is interested in sampling the solution space 
(for both free energies and fluxes) rather than focusing 
on a potentially small set of configurations (e.g. op- 
tima). Luckily however, solutions to computationally 
hard problems can often be generated efficiently with the 
help of heuristic algorithms based on simple local rules. 
The use of message-passing algorithms to characterize 
the high-dimensional volume of the solution space of FB A 
models |17j (with a convex, continuous solution space) or 
to solve large combinatorial constraint-satisfaction prob- 
lems |18) (with a discrete and possibly fragmented solu- 
tion space) is an example of the success of this kind of 
strategy. 

Our goal in this paper is to obtain information about 
the landscape of Gibbs free energies compatible with a 
given vector of reaction directions by following a route 
that allows to use all stoichiometric information via 
heuristics inspired by perceptron learning. In a nut- 
shell, the method we propose consists in exploiting the 
network's structure to iterativcly build up correlations 
between the chemical potentials of the reacting species 
starting from a seed of empirical biochemical knowledge, 
until a thermodynamically consistent profile is achieved. 
The resulting algorithm is completely scalable and can 
be employed for different purposes, like checking the fea- 
sibility of flux configurations, identifying and removing 
infeasible cycles, estimating reaction affinities, and ob- 
taining bounds for (log-)concentrations and free energies 
of formation. In the following, we describe the method 
in detail, providing a mathematical proof of convergence 
as well as theoretical arguments highlighting the main 
idea behind the procedure. As applications, we focus 
on two metabolic networks of rather different complex- 
ity. First, we shall obtain a detailed reconstruction of 
the Gibbs energy landscape underlying metabolic activ- 
ity in the human red blood cell (hRBC) starting from the 
flux maps obtained in [T9] [20] . Then, the metabolic net- 
work of Escherichia coli, iAF1260 [21 j . will be analyzed 
to eliminate infeasible cycles from randomly generated 
flux configurations. 



MATERIALS AND METHODS 
Materials 

The cellular systems analyzed in this study are (i) the 
model of the hRBC metabolism developed in[55] and 
discussed in [T3], and (ii) the reconstructed metabolic 
network of the bacterium Escherichia coli iAF1260, pre- 
sented in [2T] . The former consists of 35 intracellular re- 
actions among 39 metabolites subject to 12 uptake fluxes. 
The latter includes 2381 reactions among 1039 metabo- 
lites. The basic information extracted from these mod- 
els is the M x N stoichiometric matrix (with M and N 
the number of metabolites and reactions, respectively), 
denoted below as S. For Escherichia coli, we will con- 
sider the inner matrix of periplasmic and cytoplasmic 
reactions without repetitions, which consists of 1767 re- 
actions among 1349 chemical species, once periplasmic 
and cytoplasmic metabolites are distinguished. Accord- 
ing to the reversibility assignment given in [21], 1475 of 
the 1767 processes above are unidirectional, with 292 
being reversible. The biochemical data we shall refer 
to or make use of include the standard free energies of 
formation of metabolites, given in [23] . where they are 
computed at T = 298 K , P = 1 atm, pH = 7.6 and 
ionic strenght 0.15 M, according to the prescriptions of 
[4]. The estimated intracellular concentration ranges for 
the hRBC were extracted from the Bionumbers database 
[2"4] and refer to measurements in different settings. It 
is worth to notice that the experimental errors on such 
values reflect the intrinsic uncertainty due to statistical 
cell-to-cell fluctuations (since measurements of concen- 
tration levels are usually carried out by averaging over 
numbers of cells ranging from 10 2 to 10 8 ) and analyti- 
cal error. The stoichiometric matrix and thermodynamic 
potentials employed for the analysis of the hRBC are re- 
spectively available as Supporting Datasets SI and S2. 

Methods 

Algorithm to compute chemical potentials 

According to the second law of thermodynamics, in an 
open system at constant temperature T and pressure P 
the Gibbs energy G = E~PV-TS (where E, V, S are 
respectively the energy, volume and entropy of the sys- 
tem) never increases spontaneously. This means that the 
direction it, 6 {—1, 1} (+1 for forward, —1 for backward) 
of every chemical reaction i G {1, . . . , N} occurring in the 
system should be opposite to the Gibbs energy change 
AGi induced by reaction i, i.e. 

UiAGi < Vi . (1) 

The equality holds if reaction i is in equilibrium. De- 
noting by S a j the stoichiometric coefficient of reactant 
a G {1, . . . , M} in reaction i, with the standard sign con- 



3 



vention to distinguish substrates (S a: i < 0) from prod- 
ucts (S at i > 0), the vector of AGVs for a well- mixed 
system can be written in terms of the chemical poten- 
tials fi — {^a} (where /j, a is the Gibbs energy per mole 
of species a € {1, ... , M}) as (see e.g. [I], Ch. 1) 



Initialization 



AG 



S T fi 



(2) 



Given flux vectors v such that Sv = (i.e. steady-state 
flux configurations), equation ^ implies that v AG = 0, 
i.e. that the 'loop law' holds. The Gibbs energy land- 
scape reconstruction problem consists, given a vector 
u = {ui} of reaction directions (u, G { — 1,1}), in gen- 
erating vectors \x that satisfy the system of linear in- 
equalities 



M 

E 



(3) 



Note that the solution space of ([3| for fixed directions 
is convex (while non-convexity can arise if directions are 
allowed to vary). Relaxation methods (see e.g. [25], Ch. 
12, or are among the most effective procedures to 
find solutions of systems such as pi. These techniques 
date back at least to the Jacobi method for solving sys- 
tems of linear equalities, and were extended to inequali- 
ties in the 1950's [37J[25]. I n essence, they are iterative 
methods in which variables are updated so that at every 
iteration one of the violated inequalities is fixed. While 
this readjusts the entire vector without a guarantee that 
constraints that were previously satisfied will be broken, 
convergence to a solution (if it exists) is guaranteed if the 
update step is chosen wisely. We shall employ a relax- 
ation algorithm known as MinOver, which was developed 
in the context of neural network learning [29] . and has 
been employed, in a slightly modified form |30j . to ex- 
plore the space of flux states compatible with minimal 
stability constraints a la Von Neumann [3TJ EH] ■ Figure 
1 displays a flowchart of the procedure for the present 
case. One starts from a 'trial' probability distribution 
Po(fi) of chemical potential vectors. Its role for the mo- 
ment is simply that of initializing the algorithm, which is 
done by generating a random vector fx under Po (/-*)• For 
simplicity, one may think that Po(fj) = Ila=i -^(aO, 
with prescribed distributions Pff, e.g. uniform over a 
given interval: in this case each initial [i a is selected ran- 
domly and independently from its trial distribution Pq. 
On the other hand, Pff might contain prior biochemical 
information, e.g. by being a uniform distribution cen- 
tered around the known experimental values of fi a and 
of sufficiently large width to span several orders of mag- 
nitude in concentrations. (The precise construction of 
Po(a*) f° r our case studies is discussed below.) The algo- 
rithm is based on the following steps: 

1. Generate a chemical potential vector fi = {/i Q } 
from Pq((i). 

2. Compute x = {xi} from ^ and i a = arg minj Xi 
(i.e., iq is the index of the least satisfied constraint). 



A 'trial' probability distribution 
for M, Po(M) 



Input data 



Stoichiometric matrix, S 
Vector of reaction directions, u 



u = vector of chemical potentials 



/ O Generate |J from Po(n) 



Compute vector x={xi}, see (3) 



© update |J. see (4) 




O u is thermodynamically feasible 
u is a vector of feasible chemical potentials 



FIG. 1. Flowchart of the algorithm. Given a stoichio- 
metric matrix S and a generic vector u of reaction directions, 
the algorithm generates a vector of chemical potentials if u 
is thermodynamically feasible, u may for instance be taken 
from a steady state flux configuration. 



3. If Xi > then fi is a thermodynamically consistent 
chemical potential vector for u; exit (or go to 1 to 
obtain a different solution). 



4. If x^ < 0, update \x as 



fi - Xu io Si 



(4) 



(where A > is a constant and Sj is the j-th col- 
umn of matrix S), go to 2 and iterate. 

As is generally true in MinOver schemes, the reinforce- 
ment term in Q drives the gradual adjustment of chem- 
ical potentials by ensuring that, at every iteration, the 
least satisfied constraint (labeled io) is improved. Con- 
vergence to a solution, if one exists, is guaranteed for any 
a > 0. To see it, suppose that a vector /x* exists, such 
that 



Ui(Si ■ fx*) > c Vi 



(5) 



with c > a constant (in other words, /x* is a solution of 
Q). From Eq. Q, the chemical potential vector at the 
£-th iteration step, fi(i), satisfies 

H(£) ■ (j,* = - 1) • M* - Xaioy-viSioii-i) ■ M*) 

> n(e - 1) • + Ac 

> n(0) ■ + £Xc , (6) 

where zo(^) is the value taken by io at step £. Similarly, 
one finds 

Mf)-/xW<Mo)-M0) + « 2 i, (7) 

with A = maxi X^^t") 2 ■ ^ s a consequence, 

m - > M0) ^ +£Ac (8) 

However by the Cauchy-Swartz inequality d(£) < 1, and 
an upper bound for the number of steps can be obtained 
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from d(£ c ) = 1 which to leading orders in 1/c and 1/A 
gives 



where <5i > 0, 82 > and S3 are constants proportional 
to A. Hence starting from an initial random vector of 
chemical potentials the algorithm is able to obtain a new 
vector ensuring that ^ is satisfied. Re-initializing the 
algorithm from a different random vector sampled from 
Po(a*) allows to retrieve another solution and in turn ex- 
plore the space of /x's satisfying 

In essence, the final outcome of multiple (random) ini- 
tializations of the above algorithm is a set of correlated 
probability distributions for the i/ a 's (at odds with the 
P (/■*), which was assumed to be a product measure, so 
that no correlations were present initially). The algo- 
rithmic origin of such interdependencies can be under- 
stood considering that chemical potentials are updated 
dynamically through a series of reinforcement steps of 
the form XuiS a ^. It follows that the final value of /i a 
can be written as fj, a = ^ +«Xi t hiS a ,i, where ^ is the 
trial chemical potential sampled from Pq (i.e. the initial 
value of fj, a ) and hi G {0, 1} is an index which is updated 
(increased or decreased by one according to the sign of 
the reaction) each time reaction i tries to invert. The 
connected correlations (/x Q A^)e = (^afJ-p) — (^a)(^/3) be- 
tween chemical potentials (where (• • • ) is an average over 
all possible choices of the initial conditions) can thus be 
decomposed as 

N 

{^aHl3}c = Saf3&a + A Sg t j (fl^hj) c + 
»=1 

JV 

+ \J2SpA^h t ) c + \ 2 J2 S <*>* S M( h ihj)c , (10) 
i=l i,j 

where a 2 a is the variance of Pq and 5 a p = 1 if a = ft and 
= otherwise (so that (MaM^a)c = Sapcr 2 ). In the Gaus- 
sian approximation for Su,ihi, the leading term (of 
O(N)) in the above sum is 

N 

X 2 J2 S ^iS^l (<*¥>P)- (11) 

i=l 

Therefore to leading order, the dynamics tends to cor- 
relate (resp. anti-correlate) the chemical potentials of 
metabolites typically appearing on the same (resp. op- 
posite) side of the reaction equations. In a sense, the 
above scheme allows to modify Pq by building up cor- 
relations between chemical potentials according to the 
interconnections encoded in S. Note that, at odds with 
the method proposed in [7J, the resulting /j, a 's can exceed 
the initial bounds defined by -Po (/-*)■ 

If there is no prior information on the direction of some 
reactions (e.g. because they are putatively reversible), 
the corresponding constraints (I3J) are formally absent, as 



if Ui — 0. However, the above method still allows to 
retrieve information about the chemical potential of a 
metabolite involved in them, provided it is not employed 
in reversible processes only, in which case its fi a clearly 
is never updated. 

Finally, some observations are in order about the so- 
lution space of ([3]), which in general has the form of an 
unbounded cone passing trough the origin. If one is in- 
terested in uniform sampling the space of fi's making S 
thermodynamically feasible, boundedness is an essential 
precondition. The simplest way to obtain a bounded so- 
lution space consist in clamping some /z a 's, i.e. in keeping 
them fixed at definite values throughout the updating. 
Note that fixing some /x^'s is also crucial to set a scale for 
chemical potentials. The same effect can be achieved by 
assigning hard ranges of variability for potentials in the 
form of bounds like /i™ m < fi a _• M™ ax ( e -g- according to 
experimental or biochemical information) . Such inequal- 
ities can simply be added to the list of thermodynamic 
constraints ^ . Alternatively, one may add other types of 
global, non-homogeneous constraints bearing a physical 
justification. For instance, if uptake fluxes are included 
in S, the chemical potentials of the external metabolites 
should be fixed; or, volume constraints for the feasible 
ranges of concentrations could be added if the standard 
free energies of formation are known. Once this is done, 
the system ^ becomes inhomogeneous and its solution 
space is formed by the union of a convex polyhedron and 
a cone; boundedness can be achieved if and only if the 
cone shrinks to the null vector, which occurs if the related 
homogeneous system of equations has no solutions apart 
from the trivial one, fj, a = V/z [33 . In synthesis, one 
can obtain a bounded solution space for ^ by clamping 
the chemical potential of a sufficient number of metabo- 
lites to ensure that the homogeneous system of equalities 
associated to the inhomogeneous system of inequalities 
obtained by clamping some fi a 's in (|3| admits the null 
vector as its only solution. 

A number of interesting theoretical and computational 
questions arise at this stage, regarding e.g. the mini- 
mal amount of prior information on chemical potentials 
needed to bound the solution space of ([3]), or computa- 
tionally efficient and scalable methods to obtain uniform 
sampling (besides Monte Carlo, which may be infeasi- 
ble at high dimensions as suggested by the "curse of di- 
mensionality" , see e.g. |34|). To our knowledge, there 
is no mathematical proof that MinOver schemes are ca- 
pable of sampling a bounded solution space uniformly, 
although low dimensional tests suggest that this might 
indeed be the case [31 j - Our goal in the present paper, 
rather than uniformly sampling the space of chemical po- 
tentials granting feasibility, is that of exploring feasible 
configurations "close" to the prior biochemical informa- 
tion we have injected. To quantify this idea, we have 
explicitly compared the solutions obtained via the above 
procedure with those retrieved from the minimization of 
a cost function (the average Euclidean distance d between 
the prior and the solution) and by a standard relaxation 



5 



method (see Results: Exploring the Gibbs energy land- 
scape of the hRBC metabolic network). The heuristics 
we present indeed turns out to roughly minimize d, with 
the advantage of being considerably faster than a penalty 
method. In addition, it allows to access refined informa- 
tion on the Gibbs energy landscape (i.e. compute chem- 
ical potentials and Gibbs energy changes) even when the 
initialization is complemented by a noisy or inconsistent 
biochemical prior, since the algorithm correctly identi- 
fies and gradually removes inconsistencies. The solutions 
thus obtained identify a restricted and statistically well- 
behaved set of chemical potentials with physiological sig- 
nificance. We are currently unable to go beyond this 
point. In addition, we shall sec that by the same method 
one can verify the thermodynamic consistency of, and 
eventually adjust, specific flux configurations of biochem- 
ical reaction networks. 



Algorithm to identify and remove loops 

The algorithm just discussed generates chemical poten- 
tial vectors given a thermodynamically feasible vector of 
reaction directions. A generic assignment of reaction di- 
rections, however, could be such that the system Q has 
no solutions apart from the trivial one. In accordance 
with the Farkas-Minkowski lemma [33] this happens if 
and only if there is at least one infeasible loop, i.e. if 
there is a set C of intracellular reactions for which posi- 
tive constants fc, > exist such that 



(ii) Search, within such subset of reactions, for a loop of 
length L by looking for solutions to equation ( 12 ) 



iec 



kiUiS a i = Va 



(12) 



In presence of a loop, relaxation methods (MinOver in- 
cluded) do not converge, just because the least satisfied 
constraint moves along the loop causing the iteration to 
cycle indefinitely. More precisely, in presence of a loop 
the MinOver dynamics becomes periodic or almost peri- 
odic. For a periodic dynamics [i{£ + L) = fi(£) so that, 
using Q, for any I we have 



Mo 



r=l 



i (r+t) 



Va. 



(13) 



Comparing this with (12 1 it can be gathered that reac- 



tions updated over a period define a loop of length L 
(setting ki to the number of times constraint i has been 
updated). This suggests a simple way to correct configu- 
rations of reaction directions that are thermodynamically 
infeasible to start with: 

(i) While running the algorithm to compute chemical 
potentials for a large number of iteration steps T, 
keep track of the last say K least unsatisfied con- 
straints, i.e. store io(£) for £ = T — K + 1, . . . , T, 
with K a reasonably large number (e.g. 500), and 
count the number n of different reactions appearing 
in the series. 



with ki ^ for L reactions only, for all i-uples, 
starting from L = 3 and increasing L. 

(iii) If a loop is found, change the direction of one of its 
reversible reactions chosen with uniform probabil- 
ity. 

In the Results section we shall use this heuristics to spot 
loops and eliminate them in all of the infeasible configura- 
tions we shall generate for a large metabolic network (E. 
coli iAF 1260). For this network, it turns out that n < 50 
in all runs and that accounting for short loops (of length 
up to 6) suffices to correct all 10 5 randomly-generated 
configurations we tested. This is rather important since, 
in principle, step (ii) of the above procedure could be 
exponential in L. 

A computer code implementing the algorithms 
to compute chemical potentials and identify and 
remove infeasible loops is downloadable from 
|http://chimera.romal.infn.it/SYSBIO"71 



RESULTS 



Exploring the Gibbs energy landscape of the hRBC 
metabolic network 



As a first application, we have employed the MinOver 
scheme outlined above to analyze the thermodynamic 
landscape of the hRBC metabolic network. As a start- 
ing point, we have considered the flux configurations 
obtained in [T!5] and [5D] respectively by Monte Carlo 
sampling of the solution space of mass balance equa- 
tions (MBE) and by MinOver sampling of states com- 
patible with Von Neumann's constraints (VNC). In brief, 
MBE describe steady-state fluxes in terms of Kirchhoff- 
type laws enforced at metabolite nodes of the network as 
Sv = 0. In such a scenario, intracellular concentrations 
are clamped. VNC, instead, 'soften' the mass-balance 
equations by requiring that, for intracellular metabo- 
lites, Sv > 0. In the underlying steady state intracellular 
concentrations can grow in time if flux vectors allowing 
for it exist. Once the nutrient availability is set, VNC 
define a self-consistent flux problem where the system 
selects how much of the nutrients to use and, eventu- 
ally, which metabolites are globally produced. For intra- 
cellular metabolites, VNC correspond to a stability re- 
quirement since, in a dynamical setup, a violation of one 
of the inequalities implies the existence of a metabolite 
whose total amount used as input in metabolic processes 
exceeds the total amount returned as output (see e.g. 
[35l|36]). VNC are also closely linked to the metabolite 
producibility problem introduced in [37] ■ We shall thus 
make use of the reaction direction vectors u obtained in 
[HHU] for the hRBC. In summary: 

• according to [T!5], the net flux of all reactions is in 
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FIG. 2. Estimated log-concentrations of metabolites 
in the hRBC metabolic network. The input information 
used to initialize the algorithm (with error bars) is denoted by 
black markers (see text for details). Values obtained starting 
from direction assignments corresponding to a sample of 10 5 
MBE and VNC solutions are shown respectively as red and 
green markers. 



the forward direction, except PGI, R5PI and ApK, 
which are found to operate bidirectionally; 

• according to [2D], the net flux of all reactions is in 
the forward direction, except R5PI (which is found 
to be operating bidirectionally), and PGI and ApK 
(which are found to have a net backward flux). 

As a first step, we tested the thermodynamic feasibility of 
these direction assignments, solving ^ by starting from 
the vector /x^ r = 1 V/j. A solution is found for both MBE 
and VNC assignments. Both sets of assignments then 
turn out to be (expectedly) thermodynamically feasible. 
In this case, however, it is the emerging thermodynamic 
landscape that is of interest to us. As the initial distri- 
bution of chemical potentials Pq we selected a product 
of independent uniform distributions Pq. The Pq for 
compounds with known empirical bounds on concentra- 
tions were centered at a value (/i Q ) computed from the 
Gibbs energy of formation /io, Q and the estimated intra- 
cellular concentration c a under the hypothesis of a dilute 
solution, i.e. at 



fl 0y a + RT log c a 



(14) 



and were taken to span two standard deviations in con- 
centrations. The Pq for metabolites whose concentration 
estimates were unavailable (namely 6PGL, RL5P, X5P, 
R5P, S7P) were taken to be centered again at (14 1 but 
with c a = 1CT 4 M, and were assumed to span four orders 
of magnitude uniformly in the chemical potential scale. 
Finally, the chemical potential of water was clamped. 

Results for the estimated concentrations and Gibbs en- 
ergy changes (computed from ^ using the final chem- 
ical potential vectors) are showcased in Figures 2 and 
3, respectively. Note that several of the bounds en- 
coded in P (black markers) indicate that the Gibbs 
energy change in a reaction is positive and in contrast 
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FIG. 3. Estimated Gibbs energy changes of reactions 
in the hRBC metabolic network. The input information 
used to initialize the algorithm (with error bars) is denoted 
by black markers; the values obtained starting from direction 
assignments corresponding to MBE and VNC solutions are 
shown respectively as red and green markers. Note that the 
input information is consistent with reactions operating in the 
reverse direction for GAPDH, PGK, PGM, LDH, G6PDH, 
TA and PNPase. The algorithm is able to correct these in- 
consistencies starting from both MBE- and VNC-compatible 
direction assignments. 



with the reaction direction assignment from the flux 
problem. Specifically this happens for GAPDH, PGK, 
PGM, LDH, G6PDH, TA and PNPase, due either to 
the actual experimental estimates for the affinities or 
to the initial uncertainty we place on concentrations. 
Such bounds turn out to be altered by MinOver in a 
direction compatible with direction assignments based 
on mass balance, as final affinities display significant 
changes with respect to the picture embedded in Pq. At 
the same time, we observe that in some cases the fluc- 
tuations of [i a do exceed the initial boxes defined by 
Pq, leading to an estimate for the concentration range 
also for metabolites whose level has not been experimen- 
tally probed (6PGL, RL5P, X5P, R5P, S7P). Our pre- 
dictions for the levels of (l,3)-diphosphoglycerate ((1,3)- 
DPG), 2-phosphoglycerate (2PG) and phosphoenolpyru- 
vate (PEP) slightly differ from the experimental esti- 
mates. This is most likely a consequence of the fact that 
we are forcing the phosphoglycerate kinase (PGK) and 
the glyceraldehyde phosphate dehydrogenase (GAPDH) 
reactions in the forward direction, in agreement with the 
steady state direction assignments for glycolysis, even if 
the experimental values would classify them as reversible. 
In addition, we obtain levels of key metabolites like ATP 
and inorganic phosphate that differ slightly from exper- 
imental estimates, while our predictions for ADP and 
AMP fail under the MBE and VNC direction assign- 
ments, respectively. On one hand this could be due to 
errors in the prior information on standard free energies 
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FIG. 4. Estimated Gibbs energy changes of reactions 
in the hRBC metabolic network: comparison of dif- 
ferent algorithms. The results obtained starting from direc- 
tion assignments corresponding to MBE solutions are shown 
here for three different methods: (a) MinOver with A = 0.01 
(this paper, free markers); (b) a penalty method using the Eu- 
clidean distance between the solution and the prior as the cost 
function (blue markers); (c) a relaxation method optimized 
to be faster (red markers). For (b), we have set Ai = 0.001 
while A2 is initialized at 10 and grows in steps of 10 each 
time a minimum is found, until the configuration is feasible. 
These results show that MinOver roughly minimizes the av- 
erage Euclidean distance between the solution and the prior 
(as the penalty method), albeit with running times over 100 
times shorter (see text for details). 



but on the other hand, precise experimental estimates of 
the levels of such highly interchanging metabolites might 
be difficult to achieve. 

We can now quantify the extent to which the solutions 
we generate are "close" to the prior. In Figure 4 we com- 
pare the Gibbs energy changes obtained for the hRBC 
using MBE directions in three different ways: (a) the 
MinOver algorithm introduced here, with update given 
by Q; (b) a penalty method defined by the update rule 



H a — > fi a — Ai 



2(fi a - fi a (0)) + A 2 ^ 



iGUnsat 



(15) 



where Ai and A2 are constants and the sum extends over 
all i's such that Xi < 0; (c) a standard relaxation method 
defined by the update rule 



f a 



Soi. 



Va 



(16) 



In short, algorithm (b) minimizes the Euclidean dis- 
tance between the prior fi(0) and the solution under the 
thermodynamic constraint, which is enforced through the 
term proportional to A2 . The relaxation method (c) sim- 
ply corresponds to a particular choice of the constant A 
appearing in Q: in specific, the step size is required to be 



proportional to the amount by which the least satisfied 
constraint is violated. One sees that the penalty method 
and MinOver produce almost identical solutions. Mea- 
suring explicitly the average Euclidean distance between 
the prior and the solution (the average being taken over 
the choice of the priors), one indeed finds d ~ 15 KJ/mol 
(penalty method) versus d ~ 15.2 KJ/mol (MinOver), 
while d ~ 22.4 KJ/mol for the relaxation method (c). 
Note that the gap between the performance of MinOver 
and that of the penalty method (0.2 KJ/mol, correspond- 
ing to a relative error on d of just over 1%) provides 
a very rough estimate of the average distance between 
the solutions obtained by MinOver and those obtained 
by cost function minimization, and is much smaller than 
the spread of the initial configurations of chemical poten- 
tials, which in this case is about 15 KJ/mol. Comparing 
running times for this case, moreover, one sees that Mi- 
nOver is about 140 times faster than the penalty method 
(19 versus roughly 2700 seconds), while the standard re- 
laxation method is even faster (about 2.4 seconds). 



Identifying and removing thermodynamically 
infeasible loops in the Escherichia coli metabolic 
network model iAF1260 



The application that we have just discussed shows that 
the algorithm we present can provide information on the 
Gibbs energy landscape, even correcting inconsistent in- 
put knowledge. We shall now employ the procedure 
outlined in the Materials and Methods section to effi- 
ciently identify and eliminate loops from thermodynam- 
ically infeasible flux configurations of the reconstructed 
metabolic network of Escherichia coli iAF1260 We 
shall focus specifically on the periplasmic and cytoplas- 
mic core of the network, which presents the advantage 
that the cycles identified here are independent of the 
transport and environment selections. The network in- 
cludes 1767 reactions among 1349 chemical species. 

Since we are not focusing on the reconstruction of the 
Gibbs energy landscape but simply on the existence of so- 
lutions of j3|, a detailed biochemical prior is not needed. 
Therefore, for the present purposes we have taken Po(a*) 
to be product of identical uniform distributions. To be- 
gin with, we have fixed the direction of reactions that 
are putatively irreversible in the reconstructed network 
(1475 in total) and verified that this assignment is indeed 
thermodynamically feasible by finding a solution of ^ 
restricted to irreversible processes. Then we integrated 
the above assignments by fixing randomly and indepen- 
dently with equal probability the directions of the 292 
processes that are putatively reversible. A large ensem- 
ble of u vectors (10 5 instances) thus obtained was tested 
for thermodynamic feasibility. Note that by excluding 
the possibility that reactions are not operating we are 
considering a worst-case scenario in which all reactions 
bear a non-zero flux. Only about 1.5% of these config- 
urations turned out to be thermodynamically feasible, 
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FIG. 5. Histogram of the convergence times of the 
algorithm. Convergence times shown are for the identifi- 
cation and elimination of the thermodynamically infeasible 
loops and for the verification of thermodynamic feasibility of 
randomly generated flux configurations from the Escherichia 
coli iA1260 metabolic network model (on an Intel dual core 
at 3.06 GHz). 
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FIG. 6. Three of the 23 thermodynamically infeasi- 
ble cycles identified in the E. coli metabolic network 
iAF1260. Rectangles (resp. ellipses) denote metabolites 
(resp. reactions). The cycles depicted here are n. 8 (A, 
top left), 18 (B, top right) and 22 (C, bottom) from Table |T| 
The star indicates reversible reactions according to [21]. See 
Supporting Table SI for abbreviations. 



i.e. free from loops. We focus on infeasible instances, 
for which no vector \x of chemical potentials was found 
to satisfy applying to them the loop identification 
and removal protocol. Figure 5 shows the histogram of 
convergence times for the above procedure, i.e. the times 
required to verify that a given vector of flux directions 
is thermodynamically infeasible and to correct it. On an 
Intel dual core at 3.06 GHz the average CPU time for 
convergence is of the order of a few seconds, while it ex- 
ceeds 10 seconds in about 5% of the (random) instances. 
In the worst case within our ensemble, convergence time 
was around 100 seconds. 

We have furthermore studied the set of loops that 
were thus identified and corrected. Quite remarkably 
this analysis revealed that thermodynamical infeasibil- 
ity is related to the presence of a small set of cycles, 
23 in total. These are reported in Table [I] and three of 
them are depicted explicitly in Figure 6. Note that some 
of these cycles include a single reversible reaction. This 
implies that in order to ensure that such cycles will not 
be present in the final configuration it is necessary to 
fix the direction of the reactions SERt2rpp, GLUt2rpp, 
ACt2rpp, GLYCLTt2rpp, THRt2rpp, SUCOAS, PPAKr 
and PROt2rpp opposite to that shown in Table [I] (see the 
Supporting Table SI for abbreviations). In turn, this is 
easily seen to impose a further constraint on the direction 
of the reversible fluxes CAt6pp and GLUABUTt7pp (via 
cycles of length 4). Finally we checked that excluding 
these loops guarantees thermodynamic feasibility of 10 6 
randomly generated flux configurations 



DISCUSSION 

Ideally, constraint-based models of metabolic activity 
allow to appraise the energetic potential of cells based 
on minimal constraints related to local mass-balance and 
thermodynamic feasibility rules, possibly complemented 
with optimization principles that can encode for func- 
tional constraints. As a result, the flow of matter in 
non-equilibrium steady states could be characterized in 
terms of the Gibbs energy change of reactions, which 
specifies the directionality of interconversions, and of the 
average number of turnovers per time per volume, i.e. 
the flux, without the need of detailed information on en- 
zyme kinetics or transport mechanisms. Thermodynamic 
constraints, strongly linked to overall intracellular condi- 
tions like ionic strength and pH [38 , are particularly sub- 
tle and rich of consequences. It has indeed been argued 
that the Gibbs energy landscape contains important reg- 
ulatory information [fi. Reactions far from equilibrium 
are expected to be roughly insensitive to fluctuations in 
metabolite concentrations, so that they will be driven 
mostly by enzyme regulation. On the other hand, reac- 
tions close to equilibrium (i.e. with a net Gibbs energy 
change close to zero) bear a high sensitivity to variations 
in metabolite levels and are therefore unlikely targets for 
tight regulation. Besides, knowledge of reaction free en- 
ergies (or more precisely of the chemical potentials of the 
metabolites involved) provides clues on metabolite levels 
which would be hard to obtain from mass-balance con- 
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straints only. Therefore, developing effective procedures 
to deal with the complexity of flux models encompassing 
both mass and energy constraints at genome scale is a 
central challenge for computational systems biology. 

Many important steps have been taken recently to 
tackle it. At one level, thermodynamic feasibility can be 
translated into topologic constraints ('absence of loops') 
for the flux configuration emerging from mass-balance 
constraints |39j . This suggests than an improvement in 
reversibility assignments (e.g. along the lines of [53]) can 
be a key to ensure energy balance a priori in metabolic 
network reconstructions, with the caveat that the possi- 
bility that a reaction reverses can depend on the bound- 
ary conditions (e.g. the external supply of a certain 
metabolite) or on intracellular perturbations (e.g. a 
knockout causing the accumulation of an intermediate 
metabolite) [12]. Another possibility consists in build- 
ing mixed integer-linear constraint-based models that in- 
clude thermodynamic requirements in the form of consen- 
sus rules (using information on standard Gibbs free ener- 
gies) [13] or as additional constraints on metabolite levels 
(using information on measured intracellular concentra- 
tions) [12] . Here we have followed a different though re- 
lated route that in our view complements the approaches 
just described. The starting point is the fact that, given 
a flux configuration, thermodynamic constraints can be 
written as simple stoichiometric inequalities for the chem- 
ical potentials. Feasibility implies the existence of a vec- 
tor of chemical potentials satisfying such system. We 
have presented an algorithm that is able to construct 
solutions starting from a possibly limited and noisy bio- 
chemical prior. Our approach differs substantially from 
previous methods in that it relies on modifying the struc- 
ture of correlations between chemical potentials (after 
fixing some variables to set a scale) using the stoichio- 
metric information on reaction connectivity to drive the 
updating process. In this sense, the important difference 
with [7] is that the prior information is used only to ini- 
tialize the algorithm, and a large flexibility is allowed in 
deforming it until a viable solution is obtained. 

The usefulness of the algorithm in the analysis of 
genome-scale networks has been tested in two different 
cases. For the metabolic network of the human red 
blood cell, our approach has proved capable of recon- 
structing the Gibbs energy landscape correcting incon- 
sistent prior information. In turn, this has lead to pre- 
dictions for intracellular metabolite levels. It is impor- 
tant to stress that the bounds on concentrations we have 
obtained (which vary rather heterogeneously across com- 
pounds) only reflect stoichiometric information. For the 
metabolic network of Escherichia coli, instead, we have 
focused on the problem of correcting thermodynamically 
infeasible flux states in the core formed by the periplas- 
mic and cytoplasmic matrix. We have thoroughly ana- 
lyzed a large ensemble of configurations of reaction di- 
rections, identifying the cycles responsible for thermody- 
namic inconsistency and correcting them in a very mod- 



est amount of CPU time. Quite intriguingly, we have 
related infeasibility to the existence of a relatively small 
number of short cycles in the flux configuration, whose 
removal suffices to ensure thermodynamic feasibility in 
worst-case flux configurations. 

The main advantage of our method consists in our view 
in its efficient implementation. On the critical side, we 
point to two aspects that deserve further study. In first 
place, our tool requires flux configurations as inputs, i.e. 
it is still unable to produce thermodynamically feasible 
configurations of fluxes and chemical potentials start- 
ing from no previous reversibility hypothesis. However 
it may provide the basis of a more general procedure 
for the analysis of genome-scale metabolic networks that 
couples flux- and thermodynamic profiling, a challeng- 
ing open problem in computational biology. Secondly, 
our method relies on prior biochemical information and 
it would be desirable for it to be effective even if much 
or most biochemical priors are unknown. As we pointed 
out, some information has to be injected into the prob- 
lem for the sake of definiteness. The interesting question 
is therefore what is the minimum necessary prior needed 
to reconstruct the Gibbs energy landscape and how are 
predictions affected by restricted priors. Such problems 
are mathematical in nature and could bear a particularly 
high significance for modeling purposes. 

We remark that the corrected flux configurations thus 
obtained, like the starting ones (which were drawn from 
a uniform product measure over reactions), are not guar- 
anteed to be consistent with any steady state assump- 
tion. On the other hand, see Supporting Text SI, start- 
ing from a mass-balanced configuration one retrieves an- 
other mass-balanced configuration. Clearly, a method 
that directly generates thermodynamically steady state 
flux vectors and chemical potentials would be highly wel- 
come. Nevertheless, we note that our analysis focuses on 
a worst-case scenario where all reactions bear a non-zero 
flux. In a more realistic case where reactions may be 
inactive it is reasonable to expect that the flux direc- 
tions we generate will allow for a steady state. In turn, 
the identification of these loops might be weakly condi- 
tioned on the sample of flux assignments we employed. 
We can't rule out the possibility that assigning directions 
based e.g. on mass balance constraints provides a selec- 
tion criterion for flux configurations that differs substan- 
tially from the uniform measure we employed. However 
we expect that our sample allows to correctly identify 
loops involving at most log 2 S ~ 16 reversible reactions, 
S being the number of distinct direction assignments in 
our sample. Only loops that include a larger number 
of reversible processes might therefore play a role in a 
differently selected set of direction assignments. 

Acknowledgments — This work was supported by 
the DREAM Seed Project of the Italian Institute of 
Technology (IIT) and by the joint IIT/Sapienza Lab 
Nanomedicine. The IIT Platform "Computation" is 
gratefully acknowledged. 



10 



Cycle ID 


Lenght 


Formula 


1 


3 


SERt4pp + NAt3pp - SERt2rpp(R) 


2 


3 


NAt3pp + GLUt4pp - GLUt2rpp(R) 


3 


3 


NAt3pp - ACt2rpp(R) + ACt4pp 


4 


3 


NAt3pp - GLYCLTt2rpp(R) + GLYCLTt4pp 


5 


3 


PROt4pp - PROt2rpp(R) + NAt3pp 


6 


3 


HPYRRx - TRSARr(R) - HPYRI(R) 


7 


3 


CRNDt2rpp(R) - CRNt2rpp(R) + CRNt8pp 


8 


3 


VPAMT - ALATAL(R) + VALTA(R) 


9 


3 


ABUTt2pp + GLUABUTt7pp(R) - GLUt2rpp(R) 


10 


3 


NAt3pp- THRt2rpp(R) + THRt4pp 


11 


3 


ADK3(R) - ADK1(R) + NDPKl(R) 


12 


3 


ACCOAL + PPCSCT - SUCOAS(R) 


13 


3 


PPAKr(R) + ACCOAL + PTA2 


14 


4 


ACt4pp - CAt6pp(R) - ACt2rpp(R) + CA2t3pp 


15 


4 


CA2t3pp - GLYCLTt2rpp(R) + GLYCLTt4pp - CAt6pp(R) 


16 


4 


SERt4pp - CAt6pp(R) - SERt2rpp(R) + CA2t3pp 


17 


4 


GLUt4pp - CAt6pp(R) + CA2t3pp - GLUt2rpp(R) 


18 


4 


CA2t3pp - PROt2rpp(R) + PROt4pp - CAt6pp(R) 


19 


4 


THRt4pp - CAt6pp(R) - THRt2rpp(R) + CA2t3pp 


20 


5 


ADK1(R) - ACKr(R) + ACS + PTAr(R) - PPKr(R) 


21 


5 


R15BPK - PPM(R) - PRPPS(R) - ADK1(R) + R1PK 


22 


6 


R1PK - NDPKl(R) - PPM(R) - PRPPS(R) + R15BPK - ADK3(R) 


23 


6 


ADK3(R) - ACKr(R) + PTAr(R) + NDPKl(R) - PPKr(R) + ACS 



TABLE I. Thermodynamically infeasible cycles for the E. coli metabolic reaction network iAF1260. The list 
provides the complete set of cycles turned out in the thermodynamic feasibility analysis of a sample of 10° different randomly 
generated flux configurations. Plus (resp. minus) signs indicate that the reaction participates in the cycle in its forward 
(resp. backward) direction. (R) indicates that the corresponding reaction is putatively reversible according to [21]. Analyzing 
interdependencies in the above cycles, one sees that the directions of the putatively reversible fluxes SERt2rpp, GLUt2rpp, 
ACt2rpp, GLYCLTt2rpp, THRt2rpp, SUCOAS, PPAKr, PROt2rpp, CAt6pp and GLUABUTt7pp are in fact constrained. See 
Supporting Table SI for abbreviations. 
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